This notebook introduces the data module.


In [1]:
from disciplines.data import data
print(data.__doc__)  # the parenthesized form works in both Python 2 and Python 3


This package deals with a number of data issues.

The package wraps up different modules that can be categorized as:
	- Links to data
	- Getting data
	- Formatting/structuring data

A number of functions are created to get data from online repositories. These functions are of two types:
ready data, which is well formatted and prepared for use (CSV, XML, etc.), and scraping techniques.
Data is being transformed into one of our 

The data is stored and manipulated in three data types:
	- Relational database - used for storing large quantities of data that have a well-defined form and structure.
	- Ontology - based on networkx.
	- Pandas arrays and dataframes.

Dependencies:
	sqlite3
	pandas
	networkx

Here are a few ideas of the kinds of functions we would like to develop here.


In [2]:
#data.listall() #describes available data
#data.update() #updates data by checking the resources
#data.backup() #writes a snapshot of the data

#data.urls() # lists all known data urls and what kind of information is there
#data.datasets()  # lists all 
#data.build_ontology() # build ontology out of data stored in sql.

#data.disciplines() #lists all disciplines
#data.persons() #lists all persons
#data.events() #lists all events
#data.learned_societies() #lists all learned societies
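
As a rough sketch of how one of these listing functions might work, the example below backs a hypothetical disciplines() function with an in-memory sqlite3 database. The table name, column names, helper make_demo_db() and sample rows are assumptions made for illustration; they are not taken from the package.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical 'disciplines' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE disciplines (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    # Sample rows are made up for the demonstration.
    conn.executemany(
        "INSERT INTO disciplines (name) VALUES (?)",
        [("Physics",), ("Chemistry",), ("Biology",)],
    )
    conn.commit()
    return conn


def disciplines(conn):
    """Return all discipline names, sorted alphabetically."""
    rows = conn.execute("SELECT name FROM disciplines ORDER BY name")
    return [name for (name,) in rows]


conn = make_demo_db()
print(disciplines(conn))
```

The other listing functions (persons, events, learned societies) could follow the same pattern with their own tables.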

Further, we will merge data from different sources here.

Linking data

Here we will link data.
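
One possible way to link two datasets is a pandas merge on a shared column. The sketch below is only an illustration: the two dataframes, their column names and their contents are made up, and the package may link data differently.

```python
import pandas as pd

# Hypothetical sample data: persons and learned societies, both keyed by discipline.
persons = pd.DataFrame({
    "person": ["Ada", "Marie", "Carl"],
    "discipline": ["Mathematics", "Chemistry", "Botany"],
})
societies = pd.DataFrame({
    "discipline": ["Mathematics", "Chemistry"],
    "society": ["Example Mathematical Society", "Example Chemical Society"],
})

# A left join keeps every person, even when no society matches the discipline;
# unmatched rows get a missing value (NaN) in the "society" column.
linked = pd.merge(persons, societies, on="discipline", how="left")
print(linked)
```

A left join is used here so that no person is silently dropped; an inner join would keep only the rows that have a match on both sides.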

Getting data

There are a few ways of getting data.
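
For the "ready data" case, a minimal sketch of reading CSV text into Python records could look like this. It is written for Python 3 and uses only the standard library. The CSV content and the helper read_records() are assumptions for illustration; a real function would read from a downloaded file or response instead of a string constant.

```python
import csv
import io

# Made-up CSV text standing in for a file downloaded from a repository.
RAW_CSV = """name,discipline
Ada,Mathematics
Marie,Chemistry
"""


def read_records(text):
    """Parse CSV text into a list of plain dicts, one per data row."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


records = read_records(RAW_CSV)
print(records)
```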

Formatting/restructuring data

This will be done with Python tools.
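
As one small example of restructuring with plain Python tools, flat records can be grouped by a shared key. The helper group_by() and the sample records below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def group_by(records, key):
    """Group a list of dicts into a dict mapping each key value to its records."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[key]].append(record)
    return dict(grouped)


# Made-up flat records, e.g. as they might come out of a CSV file.
flat = [
    {"name": "Ada", "discipline": "Mathematics"},
    {"name": "Emmy", "discipline": "Mathematics"},
    {"name": "Marie", "discipline": "Chemistry"},
]

by_discipline = group_by(flat, "discipline")
print(sorted(by_discipline))
```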


In [ ]: